A toy GWAS dataset, toy.gwas, ships with the package. Let’s look at its dimensions, head and tail.
library(ggman)
dim(toy.gwas)
## [1] 21751 7
head(toy.gwas)
## chrom snp bp pvalue beta or gene
## 1.1 1 rs1_0 161003087 0.29540 0.099845335 1.1050 GENE1
## 1.2 1 rs1_5 55542379 0.56240 0.037295785 1.0380 GENE1
## 1.3 1 rs1_10 166549115 0.07658 -0.112049504 0.8940 GENE1
## 1.4 1 rs1_15 78291020 0.61850 0.044973366 1.0460 GENE1
## 1.5 1 rs1_20 40771489 0.58600 0.039220713 1.0400 GENE1
## 1.6 1 rs1_25 30693405 0.89610 -0.008536331 0.9915 GENE1
tail(toy.gwas)
## chrom snp bp pvalue beta or gene
## Y.21746 Y rs22_973 21755931 0.76200 0.022739487 1.0230 GENE2177
## Y.21747 Y rs22_978 32781818 0.56720 -0.050346374 0.9509 GENE2177
## Y.21748 Y rs22_983 27958741 0.97060 0.002995509 1.0030 GENE2177
## Y.21749 Y rs22_988 26187172 0.05613 -0.121602822 0.8855 GENE2177
## Y.21750 Y rs22_993 23036298 0.82370 -0.014200349 0.9859 GENE2177
## Y.21751 Y rs22_998 31961908 0.17560 -0.167117723 0.8461 GENE2177
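In the rows printed above, the or column looks like the exponential of the beta column, which is what one would expect if beta is a log odds ratio. The sketch below checks this on the six head rows only, with the values copied from the output; whether it holds for every row of toy.gwas is an assumption.

```r
# Values copied from the head(toy.gwas) output above.
beta <- c(0.099845335, 0.037295785, -0.112049504,
          0.044973366, 0.039220713, -0.008536331)
or   <- c(1.1050, 1.0380, 0.8940, 1.0460, 1.0400, 0.9915)

# If beta is a log odds ratio, exp(beta) should reproduce or.
or.from.beta <- exp(beta)
agrees <- isTRUE(all.equal(or.from.beta, or, tolerance = 1e-6))
agrees
```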
To create a Manhattan plot, only the first four columns (chrom, snp, bp, pvalue) are required. The column classes do not need any specific preformatting. The chromosome identifiers can be either numbers (1, 2, 3, ...) or strings (“Chr1”, “Chr2”, ...).
ggman(toy.gwas, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "pvalue")
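For your own data, it can help to confirm that the four required columns are present before plotting. The following is a minimal sketch in base R; my.gwas, its values, and the helper check_gwas_columns are hypothetical and not part of ggman.

```r
# Hypothetical minimal input; all values are made up for illustration.
my.gwas <- data.frame(
  chrom  = c("Chr1", "Chr1", "Chr2", "Chr2"),
  snp    = c("rsA", "rsB", "rsC", "rsD"),
  bp     = c(1000, 5000, 2000, 9000),
  pvalue = c(0.5, 1e-4, 0.03, 0.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of ggman): stop early if a required column
# is missing, or if a p-value is NA or outside (0, 1].
check_gwas_columns <- function(dfm, pvalue = "pvalue",
                               cols = c("chrom", "snp", "bp", "pvalue")) {
  missing.cols <- setdiff(cols, names(dfm))
  if (length(missing.cols) > 0) {
    stop("missing columns: ", paste(missing.cols, collapse = ", "))
  }
  p <- dfm[[pvalue]]
  if (any(is.na(p)) || any(p <= 0 | p > 1)) {
    stop("p-values must lie in (0, 1]")
  }
  invisible(TRUE)
}

check_gwas_columns(my.gwas)
```

A data frame that passes this check has the column names used in the ggman call above; a p-value of exactly 0 is rejected because -log10(0) is infinite.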
With relative positioning enabled, the base pair positions are scaled in proportion to the real genome positions, so gaps with no SNPs become visible. By default this is not enabled. To use the relative positions, set the option relative.positions = TRUE.
ggman(toy.gwas, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "pvalue", relative.positions = TRUE)
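To see why the two layouts differ, compare an x-axis built from the SNP order with one built from genomic coordinates. This is only a sketch of the general idea in base R with made-up positions. It is an assumption that ggman computes its axes in a similar way, and the column names x.index and x.relative are hypothetical.

```r
# Made-up positions on two chromosomes; not taken from toy.gwas.
dfm <- data.frame(
  chrom = c(1, 1, 1, 2, 2),
  bp    = c(100, 200, 900, 50, 400)
)
dfm <- dfm[order(dfm$chrom, dfm$bp), ]

# Order-based x: neighbouring SNPs are always one step apart,
# so a stretch of the genome with no SNPs leaves no visible gap.
dfm$x.index <- seq_len(nrow(dfm))

# Position-based x: shift each chromosome by the combined length
# (here, the largest observed bp) of the chromosomes before it.
chrom.ids <- unique(dfm$chrom)
chrom.len <- sapply(chrom.ids, function(ch) max(dfm$bp[dfm$chrom == ch]))
offset    <- c(0, head(cumsum(chrom.len), -1))
dfm$x.relative <- dfm$bp + offset[match(dfm$chrom, chrom.ids)]

dfm
```

In x.relative the jump from 200 to 900 on chromosome 1 is preserved, whereas x.index places those two SNPs next to each other.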
A specific set of points in the plot can be annotated by providing a data.frame that contains only the SNPs that need to be labelled. Let’s take a subset of the main data frame toy.gwas.
#subset only the SNPs with -log10(pvalue) > 8
toy.gwas.sig <- toy.gwas[-log10(toy.gwas$pvalue)>8,]
# dimensions
dim(toy.gwas.sig)
## [1] 4 7
#head
head(toy.gwas.sig)
## chrom snp bp pvalue beta or gene
## 5.18986 5 rs02_25 19843813 8.075e-09 0.5641768 1.758 GENE2178
## 5.19009 5 rs02_38 14907898 1.658e-09 0.6195006 1.858 GENE2178
## 5.19074 5 rs02_74 9657902 7.084e-09 0.6119371 1.844 GENE2179
## 5.19089 5 rs02_83 6887869 4.057e-09 0.5988365 1.820 GENE2179
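The condition -log10(pvalue) > 8 is the same test as pvalue < 1e-8. One detail to watch with your own data: if the p-value column contains missing values, logical subsetting returns an all-NA row for each of them. The sketch below uses made-up p-values to show both points; wrapping the condition in which() is one way to drop such rows.

```r
# Made-up p-values, including one missing value.
p <- c(0.5, 2e-9, NA, 3e-8, 7e-10)

# The two conditions select the same SNPs.
sig.log <- -log10(p) > 8
sig.raw <- p < 1e-8

dfm <- data.frame(snp = paste0("rs", seq_along(p)), pvalue = p,
                  stringsAsFactors = FALSE)

# Logical indexing keeps an all-NA row for the missing p-value ...
with.na <- dfm[-log10(dfm$pvalue) > 8, ]

# ... while which() returns only the positions that are TRUE.
without.na <- dfm[which(-log10(dfm$pvalue) > 8), ]
```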
The main layer of the Manhattan plot should be saved in a variable and then passed to the ggmanLabel function. The names of the columns holding the SNPs and the labels have to be supplied. In this case, we will label the points with their SNP identifiers.
## save the main layer in a variable
p1 <- ggman(toy.gwas, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "pvalue", relative.positions = TRUE)
##add label
ggmanLabel(p1, labelDfm = toy.gwas.sig, snp = "snp", label = "snp")
Annotations can be just text instead of labels. Use the type= argument.
#add text
ggmanLabel(p1, labelDfm = toy.gwas.sig, snp = "snp", label = "snp", type = "text")
The R package ggrepel is used for the annotations. All the arguments that are applicable to geom_text_repel and geom_label_repel can be passed on to ggmanLabel. Let’s change the size and colour of the labels.
ggmanLabel(p1, labelDfm = toy.gwas.sig, snp = "snp", label = "snp", colour = "black", size = 2)
Caution: providing the whole main data frame as labelDfm will fill the entire plot with text, and might crash R if the data frame is too big.
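One way to guard against an oversized label data frame is to cap the number of rows before passing it as labelDfm. The sketch below keeps the rows with the smallest p-values; top_hits, fake.gwas, and the default cap of 20 are hypothetical choices, not part of ggman.

```r
# Hypothetical helper (not part of ggman): keep at most `max.labels` rows,
# taking the smallest p-values first.
top_hits <- function(dfm, pvalue = "pvalue", max.labels = 20) {
  ord <- order(dfm[[pvalue]])
  dfm[head(ord, max.labels), , drop = FALSE]
}

# Made-up data: 1000 SNPs with uniformly distributed p-values.
set.seed(1)
fake.gwas <- data.frame(
  snp    = paste0("rs", 1:1000),
  pvalue = runif(1000),
  stringsAsFactors = FALSE
)

label.dfm <- top_hits(fake.gwas, max.labels = 5)
nrow(label.dfm)
```

The resulting label.dfm has the same columns as the input, so it can be supplied wherever a label data frame is expected.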
The function ggmanHighlight can be used to highlight a single group of points. By default, when specific points are highlighted, the main layer of the Manhattan plot is greyed out. We need to supply a character vector with the names of the SNPs to highlight. The example vector toy.highlights comes with the package.
class(toy.highlights)
## [1] "character"
length(toy.highlights)
## [1] 209
head(toy.highlights)
## [1] "rs02_2" "rs02_7" "rs02_12" "rs02_17" "rs02_22" "rs02_27"
ggmanHighlight(p1, highlight = toy.highlights)
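A highlight vector for your own data can be built by subsetting the SNP column. The sketch below selects the SNPs of one gene from a made-up data frame; fake.gwas, GENE_X, and my.highlights are hypothetical names used only for illustration.

```r
# Made-up data; the gene names are placeholders.
fake.gwas <- data.frame(
  snp  = c("rsA", "rsB", "rsC", "rsD"),
  gene = c("GENE_X", "GENE_Y", "GENE_X", "GENE_Z"),
  stringsAsFactors = FALSE
)

# Character vector of the SNP names to highlight.
my.highlights <- as.character(fake.gwas$snp[fake.gwas$gene == "GENE_X"])

# Sanity checks: a character vector whose names all occur in the main data.
stopifnot(is.character(my.highlights),
          all(my.highlights %in% fake.gwas$snp))
my.highlights
```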
The function ggmanHighlightGroup can be used to highlight multiple groups of points, and a legend can be added. Let’s look at the example data toy.highlights.group.
class(toy.highlights.group)
## [1] "data.table" "data.frame"
dim(toy.highlights.group)
## [1] 609 8
head(toy.highlights.group)
## chrom snp bp pvalue beta or gene group
## 1 13 rs06_2_M 24226825 0.0794900 0.18148788 1.199 GENE2180 group2
## 2 13 rs06_7_M 23664350 0.0005127 0.36325326 1.438 GENE2180 group2
## 3 13 rs06_12_M 19042292 0.2111000 0.13627762 1.146 GENE2180 group2
## 4 13 rs06_17_M 24586858 0.0193900 0.27459683 1.316 GENE2180 group2
## 5 13 rs06_22_M 20332216 0.4479000 0.09621886 1.101 GENE2180 group2
## 6 13 rs06_27_M 24855237 1.0000000 0.00000000 1.000 GENE2180 group2
Unlike ggmanHighlight, the function ggmanHighlightGroup requires a data.frame as input. One of its column names should be supplied as the grouping variable. The size of the highlighted points can be changed with the size argument, and the legend title can be specified with the legend.title argument.
ggmanHighlightGroup(p1, highlightDfm = toy.highlights.group, snp = "snp", group = "group", size = 0.5, legend.title = "Significant groups")
It is also possible to remove the legend using the legend.remove argument.
ggmanHighlightGroup(p1, highlightDfm = toy.highlights.group, snp = "snp", group = "group", size = 0.5, legend.remove = TRUE)
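A grouped highlight data frame is simply a subset of the main data with one extra column naming the group. The sketch below derives such a column from p-value bands on made-up data; the cut-offs and the labels "strong" and "suggestive" are arbitrary assumptions chosen for illustration.

```r
# Made-up data.
fake.gwas <- data.frame(
  chrom  = c(1, 1, 2, 2, 3),
  snp    = c("rsA", "rsB", "rsC", "rsD", "rsE"),
  pvalue = c(0.2, 1e-6, 0.04, 3e-9, 0.8),
  stringsAsFactors = FALSE
)

# Keep only the SNPs to highlight, then assign each one to a group.
hits <- fake.gwas[fake.gwas$pvalue < 0.05, ]
hits$group <- ifelse(hits$pvalue < 1e-5, "strong", "suggestive")

table(hits$group)
```

The column name "group" would then be the value supplied as the grouping variable.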
The function ggmanZoom can be used to create a regional association plot. The chromosome and the starting and ending base pair positions should be specified. If only the chromosome is specified, the whole chromosome will be shown. First, let’s see the whole chromosome 1 plot.
ggmanZoom(p1, chromosome = 1)
Next, let’s zoom in to the chromosome 1 region containing genes: GENE21, GENE22 and GENE23.
ggmanZoom(p1, chromosome = 1, start.position = 14209481, end.position = 238131450)
Let’s highlight the genes and add a legend.
ggmanZoom(p1, chromosome = 1, start.position = 14209481, end.position = 238131450, highlight.group = "gene")
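The start and end positions for a zoomed region can be read off the data rather than typed by hand. The sketch below takes the smallest and largest base pair position among the SNPs of the chosen genes; the data frame and gene names are made up, and the variable names merely mirror the arguments used above.

```r
# Made-up data; gene names are placeholders.
fake.gwas <- data.frame(
  chrom = c(1, 1, 1, 1, 2),
  bp    = c(500, 1200, 1800, 4000, 700),
  gene  = c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"),
  stringsAsFactors = FALSE
)

# SNPs on chromosome 1 that belong to the genes of interest.
genes.of.interest <- c("GENE_B", "GENE_C")
region <- fake.gwas[fake.gwas$chrom == 1 &
                    fake.gwas$gene %in% genes.of.interest, ]

# The region spans the smallest to the largest position among those SNPs.
start.position <- min(region$bp)
end.position   <- max(region$bp)
c(start.position, end.position)
```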